August 30, 2004 

  PCI Express 

Introduction

In my editorial previewing DAC I identified the SIGGRAPH
International Conference on Computer Graphics and Interactive
Techniques as a comparable event insofar as it is a high-tech show
targeted at professionals (digital content creators and graphics
professionals) and has a parallel conference with high quality
tutorials, panels and technical papers. This year the show was held
in Los Angeles from August 8th to 12th drawing 27,825 professionals
including yours truly. SIGGRAPH (Special Interest Group on Computer
Graphics and Interactive Techniques) is part of the Association for
Computing Machinery (ACM).
Decades ago the SIGGRAPH show was dominated by CAD companies and by
vendors of plotters, monitors and workstations. A lot has changed
over the years. This year's show highlights the graphics-related
products, tools, and technologies (current and future) used to
create feature films, television programs, commercials, music and
corporate videos, game production, web design and interactive web
streaming. Hardware exhibitors included firms with equipment for
motion capture, scanning, video effects, digital video, graphics
boards and processors. Software exhibitors included firms with
offerings for 3D modeling, animation, video production,
visualization, rendering, streaming, video encoding and compression
and visual effects.

We are all familiar with the stunning visual effects and imagery
that are commonplace in movies, television and on the web. Because
of the nature of the show there was an electronic theater, an
animation theater, a cyberfashion show and an art gallery, all
showing a blend of art and technology. Walking around the exhibit
floor was like watching movie trailers in a movie theater.
Impressive demonstrations were given of how these effects are
produced by the likes of Discreet (3ds max), Alias (Maya), Pixar
(RenderMan) and Avid/Softimage (XSI). Clearly these software
packages require significant graphics and computational horsepower.
The computer graphics industry has come a long, long way since the
days of Tron (1982) and the Norelco shaver commercial based upon
technology from MAGI (Mathematical Applications Group, Inc.).

Leading vendors of graphics chips and graphics accelerator boards,
including NVIDIA, ATI and 3Dlabs, were also exhibiting. All three
made significant product announcements during SIGGRAPH.

NVIDIA 

NVIDIA Corporation was founded in 1993. The company designs,
develops and markets graphics processing units (GPUs), media and
communications processors (MCPs), ultra-low power media processors
(UMPs), and related software. These products have been incorporated
into a wide variety of computing platforms, including consumer PCs,
enterprise PCs, notebook PCs, professional workstations, handhelds,
and video game consoles. NVIDIA is headquartered in Santa Clara,
California and employs more than 2,000 people worldwide. In 2003
NVIDIA had revenues of $1.82 billion (North America $405 million,
Asia Pacific $1.26 billion and Europe $106 million), with four
customers accounting for 60% of revenue.

On August 9th NVIDIA introduced its Quadro FX 4400 model, part of a
distinctive new family of professional graphics products based on
the PCI Express bus architecture. The new products include: the FX
4400, the performance leader (135 million triangles/sec, 6.4 billion
texels/sec fill rate), featuring 512MB of GDDR3 frame buffer memory,
a 256-bit memory interface, 35.2GB/sec of memory bandwidth, 3-pin
stereo support and dual DVI display connectors; the FX 4400G, which
adds genlock and framelock capabilities; the FX 1400, the popular
price-performance model; and the FX 540, with high-definition
component output support for video previewing and recording.
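
As a rough sanity check on the memory figures quoted for the FX 4400, the sketch below derives the data rate implied by a 256-bit interface delivering 35.2GB/sec. It assumes decimal gigabytes and one bus-wide transfer per data beat; the function name is illustrative and is not taken from any NVIDIA documentation.

```python
# Back-of-the-envelope check of the quoted Quadro FX 4400 memory figures.
# Both inputs come from the text; the transfer rate is derived from them.

BUS_WIDTH_BITS = 256        # memory interface width quoted in the text
BANDWIDTH_GB_PER_S = 35.2   # memory bandwidth quoted in the text (decimal GB assumed)


def implied_transfer_rate_mts(bus_width_bits, bandwidth_gb_s):
    """Return the rate in million transfers/sec implied by width and bandwidth."""
    bytes_per_transfer = bus_width_bits / 8
    return bandwidth_gb_s * 1000 / bytes_per_transfer


rate = implied_transfer_rate_mts(BUS_WIDTH_BITS, BANDWIDTH_GB_PER_S)
print(f"implied data rate: {rate:.0f} million transfers/sec")
print(f"implied clock if two transfers per clock (DDR): {rate / 2:.0f} MHz")
```

Under those assumptions the quoted figures work out to about 1,100 million transfers per second, which would correspond to a 550 MHz clock for double-data-rate memory.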

By using an innovative PCI Express high-speed interconnect (HSI), a
complex piece of networking technology that performs seamless,
bi-directional interconnect protocol conversion at incredible line
speeds, NVIDIA can transform its award-winning GeForce FX series into
a full family of PCI Express GPUs. This approach allows the firm to
manufacture one GPU with support for two interfaces: PCIe and AGP.
In the NV3x family, AGP is supported natively, and the HSI bridge
provides quick access to PCI Express operability. For the NV4x
family with PCI Express support, the same HSI bridge can be reversed
for an AGP variant of the board.

The NVIDIA Quadro models support SLI (Scalable Link Interface), a new
technology that enables two Quadro FX graphics boards to operate in
a single workstation. SLI is based on an intelligent communication
protocol embedded in the GPU and a high-speed digital interface on
the graphics board to facilitate data flow. An extensive suite of
software provides dynamic load balancing, and advanced rendering and
compositing to ensure smooth frame rates and image quality.

NVIDIA and notebook manufacturers have co-designed the MXM (Mobile
PCI Express Module) interface to provide a consistent interface for
mobile PCI Express graphics. The MXM initiative supports a wide
range of graphics solutions from any GPU manufacturer.

ATI Technologies Inc.

Founded in 1985, ATI Technologies Inc. is a leader in the supply of
graphics, video and multimedia solutions for desktop personal
computers, mobile computing, DTV, cell phones, handhelds, consoles
and workstation products. ATI Technologies comprises three core
business units: desktop, integrated and mobile, and consumer. ATI
employs 2,200 people and in 2003 had revenues of $1.38 billion. By
region (in millions): Canada $20, US $285, Europe $113 and Asia
Pacific $993. By segment (in millions): Components $962, Boards $397
and Other $25.

On June 1st ATI announced a complete line-up of FireGL workstation
graphics accelerators with full native support for the PCI Express
bus architecture. The newly branded Visualization series offers four
new graphics accelerators: the high-end FireGL V7100, the mid-range
FireGL V5100 and, redefining the entry-level segment, the FireGL
V3200 and the FireGL V3100. With 256 MB of memory and a visual
processing unit (VPU) architected with 16 pixel pipelines and six
vertex processors, the FireGL V7100 doubles the rendering power of
ATI's previous high-end FireGL products. The FireGL V7100 supports
dual DVI configurations as well as dual-link DVI, which allows it to
drive ultra-high-resolution 9-megapixel displays.

On August 13th ATI announced that it had shipped its one millionth
native PCI Express visual processing unit (VPU), speeding the
industry transition to PCI Express.

3Dlabs

3Dlabs was acquired by Singapore multinational Creative Technology
Ltd. in early 2002 for $104 million and now operates as a wholly
owned subsidiary. Creative was founded in Singapore on July 1, 1981.
Product segments include audio (Sound Blaster), speakers, personal
digital entertainment, communication and graphics. For 2003 Creative
had revenue of $702 million. Sales of graphics products are roughly
10% of revenue.

On June 15th 3Dlabs announced the PCI Express-based Wildcat Realizm
800. The Wildcat Realizm 800 features a unique Wildcat Realizm
Vertex/Scalability Unit (VSU) and dual Wildcat Realizm Visual
Processing Units (VPU) to deliver over 700 GFLOPS of floating-point
graphics processing. These work together to enable a
software-compatible family of graphics accelerators ranging from a
single VPU AGP 8x solution to a unique dual-VPU configuration, which
takes full advantage of the enhanced bandwidth of PCI Express. The
Wildcat Realizm VSU receives graphics commands at full bandwidth
from a 16-lane PCI Express interface and processes vertices with 67
billion floating point operations per second in a powerful SIMD
array of highly optimized vector processors. The VSU is then able to
drive two VPUs at full bandwidth over an 8.4GB/sec interface while
optimally distributing graphical primitives between the two VPUs to
achieve a genuine doubling of both geometry and fill-rate
performance. The Wildcat Realizm 800 is slated for availability in
the third calendar quarter of this year at an MSRP of US$2,799.


OpenGL

Graphics Processing Units are programmable. The industry standard for
this purpose is OpenGL. The OpenGL API (Application Programming
Interface) began as an initiative by Silicon Graphics Inc. (SGI) to
create a single, vendor-independent API for the development of 2D
and 3D graphics applications. The specification was largely based on
earlier work on the SGI IRIS GL library. SGI produced a sample
implementation that hardware vendors could use to develop OpenGL
drivers for their hardware. The sample implementation has been
released under an open source license. Modifications to the OpenGL
API are made through the OpenGL Architecture Review Board.

The OpenGL Architecture Review Board (ARB), an independent
consortium formed in 1992, governs the OpenGL specification.
Composed of many of the industry's leading graphics vendors, the ARB
defines conformance tests and approves new OpenGL features and
extensions. As of October 2003, voting members of the ARB include
3Dlabs, Apple, ATI, Dell Computer, Evans & Sutherland,
Hewlett-Packard, IBM, Intel, Matrox, NVIDIA, SGI and Sun.

Many OpenGL extensions have been defined by vendors and groups of
vendors. The OpenGL Utility Library (GLU) provides many modeling
features, such as quadric surfaces and NURBS curves and surfaces.
GLU is a standard part of every OpenGL implementation. Also, there
is a higher-level, object-oriented toolkit, Open Inventor, which is
built atop OpenGL, and is available separately for many
implementations of OpenGL.

The figure below gives an overview of the traditional graphics
pipeline. Geometry (vertices, lines, polygons) and pixel data
(pixels, images, bitmaps) take different routes.
  
[Figure: OpenGL Pipeline Architecture]

All geometric primitives are described by vertices. Even parametric
curves and surfaces can be mathematically defined by a net of
control points. Evaluators are used to calculate vertex properties
such as surface normals, texture coordinates, colors, and spatial
coordinate values. Per-vertex operations include transformations
(scaling, translation, rotation) and projections from 3D to 2D.
Advanced operations for lighting and texture may also be applied.
Primitives (lines, polygons, bitmaps) are then assembled and
clipped. Rasterization is the conversion of both geometric and pixel
data into fragments. Each fragment square corresponds to a pixel in
the framebuffer. Line and polygon stipples, line width, point size,
shading model, and coverage calculations to support antialiasing are
taken into consideration as vertices are connected into lines or the
interior pixels are calculated for a filled polygon. Color and depth
values are assigned for each fragment square. Blending, dithering,
logical operation, and masking by a bitmask may also be performed.
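
The per-vertex stage described above (scaling, rotation, translation, then projection from 3D to 2D) can be illustrated with a small matrix sketch. This is plain Python rather than OpenGL code, the helper names are hypothetical, and the projection is a bare pinhole model instead of the clip-space matrix a real OpenGL implementation builds, so treat it as an illustration of the idea only.

```python
import math


def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested lists (row-major)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply(m, v):
    """Apply a 4x4 matrix to a homogeneous column vector (x, y, z, w)."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def scaling(s):
    return [[s, 0, 0, 0], [0, s, 0, 0], [0, 0, s, 0], [0, 0, 0, 1]]


def rotation_z(degrees):
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def simple_perspective(d):
    """Pinhole projection onto a plane at distance d; camera looks down -z."""
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, -1 / d, 0]]


def project_vertex(vertex, model_view, projection):
    """Transform a 3D vertex and return its 2D position after the w divide."""
    x, y, z, w = apply(mat_mul(projection, model_view), [*vertex, 1.0])
    return x / w, y / w


# Scale first, then rotate, then translate (matrices compose right to left).
model_view = mat_mul(translation(0, 0, -4), mat_mul(rotation_z(90), scaling(2)))
x2d, y2d = project_vertex((1.0, 0.0, 0.0), model_view, simple_perspective(1.0))
print(f"projected vertex: ({x2d:.3f}, {y2d:.3f})")
```

In this example the vertex (1, 0, 0) is scaled to (2, 0, 0), rotated onto the y axis, pushed four units away from the camera, and ends up at roughly (0, 0.5) on the image plane after the perspective divide.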

Graphics application vendors (digital content creation, CAD/CAM,
etc.) build sophisticated software on top of OpenGL, with interactive
GUIs to define, edit and display models, animations and videos.
Complex operations can be invoked by drag-and-drop techniques.

PCI Express Intro

All of the graphics products described above have in common that
they are PCI Express based. Jim Pappas, Director of Initiative
Marketing for Intel's Enterprise Platform Group, says, "The graphics
industry is expected to make a rapid transition to PCI Express
taking advantage of the technology's increased performance
characteristics."

According to Jen-Hsun Huang, president and CEO of NVIDIA, "The PCI
Express transition is going to be an exciting time for the PC
industry. By aligning ourselves closely with Intel and
helping define this new specification, we were able to engineer an
innovative protocol engine, in HSI, that delivers the full-PCI
Express feature set without any compromises. HSI and PCI Express
will enable a new level of performance for high bandwidth
applications like graphics and networking."

"Since the outset of the PCI Express initiative, our aim was to
deliver a top-to-bottom family of PCI Express graphics cards to our
OEM customers, hastening the PCI Express transition," said Rick
Bergman, Senior Vice President Marketing and General Manager,
Desktop, ATI Technologies. "PCI-E is a major innovation in the
computer architecture and there is a rapid transition in the market
to this bus standard."

Note that while ATI and NVIDIA agree on the importance of PCI
Express, the two companies are initially supporting PCI Express in
very different ways: ATI will provide PCI Express compatibility with
a new line of GPUs that offer native PCI-E support, while NVIDIA's
first PCI Express efforts will use a High-Speed Interconnect (HSI)
bridge chip to graft AGP GPUs to the PCI-E interface. This enables
them to maintain parallel AGP and PCI-E GPU lines. The two firms are
arguing in public over the strengths and weaknesses of the bridge
approach.

Before describing PCI Express, we should review PCI, the standard it
is replacing.

PCI

Formed in 1992, PCI-SIG (originally formed as the Peripheral
Component Interconnect Special Interest Group) is the industry
organization chartered with the development and management of the
PCI bus specification, the industry standard for a high-performance
I/O interconnect to transfer data between a CPU and its peripherals.
The PCI-SIG currently has more than 800 member companies.

The PCI (Peripheral Component Interconnect) bus structure introduced
in 1992 has been the mainstay for over a decade. The original
33-MHz, 32-bit implementation delivers a peak theoretical bandwidth
of 133 megabytes per second. Later generations of backwards-compatible
PCI bus specifications emerged to improve performance, including a
64-bit, 66-MHz combination with a bandwidth of 533MB/s. PCI-X 1.0,
with a maximum clock speed of 133 MHz on a 64-bit bus, was developed
to increase the bus speed, reduce latency and improve the protocol.
The PCI-X 2.0 specification extends the effective bus frequency to
266 MHz and 533 MHz and adds advanced features such as ECC.
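
The peak figures for these parallel buses follow from one piece of arithmetic: bus width in bytes multiplied by the number of transfers per second. The sketch below works through it, assuming that the nominal "33 MHz" clock is 33.33 MHz and that the faster variants are exact multiples of it; the bus list and function name are illustrative.

```python
# Nominal "33 MHz" PCI clocks are assumed here to be multiples of 33.33 MHz.
BASE_CLOCK_MHZ = 100 / 3


def peak_mb_per_s(width_bits, transfers_mhz):
    """Peak bandwidth of a parallel bus: bytes per transfer x million transfers/sec."""
    return width_bits / 8 * transfers_mhz


BUSES = [
    ("PCI 32-bit, 33 MHz", 32, BASE_CLOCK_MHZ),
    ("PCI 64-bit, 66 MHz", 64, 2 * BASE_CLOCK_MHZ),
    ("PCI-X 1.0 64-bit, 133 MHz", 64, 4 * BASE_CLOCK_MHZ),
    ("PCI-X 2.0 64-bit, 266 MHz effective", 64, 8 * BASE_CLOCK_MHZ),
    ("PCI-X 2.0 64-bit, 533 MHz effective", 64, 16 * BASE_CLOCK_MHZ),
]

for name, width, rate in BUSES:
    print(f"{name:38s} {peak_mb_per_s(width, rate):7.0f} MB/s")
```

These are theoretical peaks for a shared bus. Every device on the bus competes for the same bandwidth, and arbitration and protocol overhead reduce what any one device actually sees.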

Also introduced was the Accelerated Graphics Port (AGP)
specification, which defined a dedicated high-speed PCI bus for
graphics operations. The AGP bus offloaded graphics traffic from the
PCI system bus and freed up bandwidth for other communications and
I/O operations. The initial version of AGP was a 32-bit bus running
at 66 MHz with a peak data transfer rate of 266 MB/s. AGP has
evolved to AGP2X, AGP4X, and finally today's AGP8X, which operates
at 2.134 gigabytes per second (GB/sec). In addition, Intel recently
added dedicated USB 2.0 and Serial ATA links to the Southbridge in
its chip sets, further reducing the I/O demands on the PCI bus.
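
The AGP numbers can be reproduced the same way: the 2X, 4X and 8X modes keep the 32-bit, 66 MHz bus and move two, four or eight transfers per clock. A minimal sketch, assuming a nominal clock of 66.67 MHz (the constant and function names are illustrative):

```python
AGP_WIDTH_BITS = 32
AGP_CLOCK_MHZ = 200 / 3  # nominal "66 MHz" clock, assumed to be 66.67 MHz


def agp_peak_mb_per_s(transfers_per_clock):
    """Peak AGP bandwidth for a given number of transfers per clock cycle."""
    return AGP_WIDTH_BITS / 8 * AGP_CLOCK_MHZ * transfers_per_clock


for mode in (1, 2, 4, 8):
    print(f"AGP{mode}X: {agp_peak_mb_per_s(mode):6.0f} MB/s")
```

Small differences from the quoted 266 MB/s and 2.134 GB/sec come down to how the nominal clock is rounded.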
  
[Figure: Comparison of Bus Architecture Performance]

The PCI architecture is shown in the diagram below.
  
[Figure: PCI Architecture]

Note that the Host Bridge is often referred to as the Northbridge,
while the I/O Bridge is referred to as the Southbridge. The
Northbridge connects to the fastest devices, namely the CPU, memory
and graphics. The Southbridge routes traffic from the different I/O
devices on the system (hard drives, USB ports, Ethernet ports, etc.)
to the Northbridge and on to the CPU and/or memory. Because the PCI
bus is not fast enough for some devices, the trend has been to attach
interfaces (SATA, USB) directly to the Southbridge. Thus we now have
a collection of specialized buses with different protocols and
bandwidth capabilities.

The demands of emerging computing and communications platforms
exceed the capabilities of the traditional 32 bit, 33 MHz PCI bus.
Technical innovations such as 10 GHz+ CPU speeds, faster memory,
higher-speed graphics, gigabit networking, 1394b, and other
applications will drive the need for much greater internal system
bandwidth. For example, both 1394b and Gigabit Ethernet require
bandwidth that exceeds PCI's current shared 133MB/sec maximum
bandwidth. The general consensus is that PCI and AGP have reached
their limits while the demand for increased performance and
bandwidth only increases. The PCI bus cannot be easily scaled up in
frequency or down in voltage. In addition, the PCI bus does not
support features such as advanced power management, native hot
plugging/hot swapping of peripherals, or Quality of Service (QoS) to
guarantee bandwidth for real-time operations. Finally, all of the
available bandwidth of the PCI bus is limited to one direction (send
or receive) at a time.

PCI Express (PCIe)

PCI-SIG (the Peripheral Component Interconnect Special Interest
Group) defines PCI Express as "...an open specification designed
from the start to address the wide range of current and future
system interconnect requirements of multiple market segments in the
computing and communications industries. The PCI Express
Architecture defines a flexible, scalable, high-speed, serial,
point-to-point, hot pluggable/hot swappable interconnect that is
software-compatible with PCI."

PCI Express (formerly 3GIO) is a new I/O technology that is
compatible with the current PCI software environment. PCI Express
defines a packetized protocol and load/store architecture. Its
layered architecture enables attachment to copper, optical, or
emerging physical signaling media. PCI Express uses an embedded
clocking scheme to enable better frequency scaling and provides many
advanced features as well as innovative form factors. It can be used
for chip-to-chip and add-in card applications to provide
connectivity for adapter cards, as a graphics I/O attach point for
increased graphics bandwidth, as well as an attach point to other
interconnects like 1394b, USB 2.0, InfiniBand Architecture and
Ethernet.

Multiple point-to-point connections introduce a new element, the
switch, into the I/O system topology. The switch replaces the
multi-drop bus and is used to provide fan-out for the I/O bus. A
switch may provide peer-to-peer communication between different
endpoints and this traffic, if it does not involve cache-coherent
memory transfers, need not be forwarded to the host bridge.
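
The role of the switch can be pictured with a toy model in which each downstream port claims a window of memory addresses: a request whose address falls in a sibling port's window is delivered peer-to-peer, and anything else is passed upstream toward the host bridge. The class names, port names and address windows below are invented for illustration; real switches implement this with per-port base and limit registers and handle many more transaction types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DownstreamPort:
    """Hypothetical downstream port that claims one memory address window."""
    name: str
    base: int
    limit: int  # inclusive upper bound of the window

    def claims(self, address):
        return self.base <= address <= self.limit


class ToySwitch:
    """Simplified address-routed switch with one upstream port."""

    UPSTREAM = "upstream (host bridge)"

    def __init__(self, ports):
        self.ports = list(ports)

    def route(self, address):
        """Return the name of the port a memory request is forwarded to."""
        for port in self.ports:
            if port.claims(address):
                return port.name      # stays inside the switch (peer-to-peer)
        return self.UPSTREAM          # not claimed locally, forward upstream


switch = ToySwitch([
    DownstreamPort("graphics", 0xD000_0000, 0xDFFF_FFFF),
    DownstreamPort("gigabit-nic", 0xE000_0000, 0xE00F_FFFF),
])

print(switch.route(0xD000_1000))  # handled by the graphics port
print(switch.route(0x0010_0000))  # system memory, goes upstream
```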

The PCI Express architecture defines a high-performance,
point-to-point, scalable, serial bus. A PCI Express link consists of
dual simplex channels, implemented as a transmit pair and a receive
pair for simultaneous transmission in each direction. Each pair
consists of two low-voltage, differentially driven signals. A data
clock is embedded in each pair using an 8b/10b encoding scheme to
achieve very high data rates. The initial signaling rate is 2.5Gb/s
per direction and is expected to increase with silicon technology to
10Gb/s per direction (the practical maximum for signals in copper).
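
Because 8b/10b encoding sends ten bits on the wire for every eight bits of data, the usable data rate of a lane is lower than its raw signaling rate. The sketch below works through the arithmetic for a single lane; it ignores packet and protocol overhead, and the function name is illustrative.

```python
ENCODING_EFFICIENCY = 8 / 10  # 8b/10b: 8 data bits carried in every 10 line bits


def lane_data_rate_mb_per_s(signaling_rate_gbps):
    """Usable data rate of one lane, one direction, in MB/s (decimal)."""
    data_bits_per_s = signaling_rate_gbps * 1e9 * ENCODING_EFFICIENCY
    return data_bits_per_s / 8 / 1e6


for rate_gbps in (2.5, 10.0):
    one_way = lane_data_rate_mb_per_s(rate_gbps)
    print(f"{rate_gbps:4.1f} Gb/s lane: {one_way:.0f} MB/s per direction, "
          f"{2 * one_way:.0f} MB/s both directions combined")
```

At the initial 2.5Gb/s rate this gives 250 MB/s in each direction for a single lane, or 500 MB/s when both directions are counted together.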

  
[Figure: PCI Express Physical Layer]

The bandwidth of a PCI Express link may be linearly scaled by adding
signal pairs to form multiple lanes. The physical layer supports x1,
x2, x4, x8, x12, x16 and x32 lane widths and stripes the byte data
across the available lanes. Each byte is transmitted, with 8b/10b
encoding, across the lane(s). This data disassembly and re-assembly
is transparent to other layers. PCI Express provides I/O attach
points for high-performance graphics, 1394b, USB 2.0, InfiniBand
Architecture, Gigabit networking and so on.
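
Lane scaling and the disassembly and re-assembly of byte data can be sketched as round-robin striping: consecutive bytes go to consecutive lanes and are interleaved back on the receiving side. This is a simplified illustration that leaves out framing symbols, 8b/10b encoding and scrambling. The 250 MB/s per-lane figure is the 2.5Gb/s signaling rate after 8b/10b overhead, and the function names are hypothetical.

```python
LANE_MB_PER_S = 250  # 2.5 Gb/s x 8/10 encoding efficiency / 8 bits per byte
LANE_WIDTHS = (1, 2, 4, 8, 12, 16, 32)


def stripe(data, lanes):
    """Distribute bytes round-robin: byte i travels on lane i % lanes."""
    return [data[i::lanes] for i in range(lanes)]


def unstripe(per_lane):
    """Interleave the per-lane byte streams back into the original order."""
    lanes = len(per_lane)
    out = bytearray(sum(len(chunk) for chunk in per_lane))
    for i, chunk in enumerate(per_lane):
        out[i::lanes] = chunk
    return bytes(out)


payload = bytes(range(10))
lanes = stripe(payload, 4)
print([list(chunk) for chunk in lanes])
print(unstripe(lanes) == payload)

for width in LANE_WIDTHS:
    print(f"x{width:<2d} {width * LANE_MB_PER_S:5d} MB/s per direction")
```

With these assumptions an x16 graphics link carries about 4,000 MB/s in each direction, roughly double the AGP8X peak.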

PCI Express will be available in a number of different I/O expansion
formats, depending on the platform - notebook, desktop, or server.
Servers, which require larger bandwidths to service I/O
requirements, will have more PCI Express slots, and these slots will
provide higher PCI Express lane counts. In contrast, a notebook may
use the PCI Express architecture internally, but only expose a
single x1 lane for medium speed peripherals.

The PCI Express architecture is a high-speed, general-purpose serial
I/O interconnect that provides the bandwidth required for current
and future applications. It has already caused ripple effects, as
evidenced by the actions of complementary standards organizations.

ASI SIG

The Advanced Switching Interconnect Special Interest Group (ASI SIG)
is a nonprofit collaborative trade organization chartered with
providing a switched fabric interconnect standard for the
communications and compute industries.

Advanced Switching is a standards-based switched-interconnect and
data-fabric architecture based on PCI Express technology for
connecting system boards and components in future products. Advanced
Switching uses the same physical and link layers as the PCI Express
architecture to achieve widespread interoperability and availability
of technology. Together, PCI Express and Advanced Switching
technologies ensure broadly available building blocks and tools that
enable component and equipment makers to reuse technology across
multiple products, reduce design costs and shorten the time it takes
to get products to market.

Express Card

PCMCIA (Personal Computer Memory Card International Association) is
an international standards body and trade association with over 200
member companies that was founded in 1989 to establish standards for
Integrated Circuit cards and to promote interchangeability among
computer systems. In September 2003 PCMCIA introduced ExpressCard
(code name NEWCARD) as a new standard for hot-swappable system
modules, which it believes will replace CardBus as the preferred
solution for end-user add-ins. Based on the PCI Express architecture
and Universal Serial Bus (USB) 2.0 interfaces, ExpressCard connects
directly to chipsets, removing the need for a bridge component. It
supports a bidirectional, single-lane PCI Express link, which
translates to a peak data rate of 250MB/sec in each direction, in
comparison to the 132MB/sec of the PC Card standard.

There are two standard formats of ExpressCard modules: the
ExpressCard/34 module, which is 34 mm wide, and the ExpressCard/54
module, which is 54 mm wide. Both modules are 75 mm long and 5 mm
high, and both have 26 pins, compared to the 68 pins of the CardBus
cards they would replace. Both also dissipate less than 1.3 watts.
By combining USB 2.0 and PCI Express interfaces in a single form
factor, ExpressCard makes it easier to expand a machine in a variety
of ways without opening the box to gain access to slots.

EDA Vendors

The graphics chips described above are the types of semiconductors
that push the envelope of EDA toolsets. The customers of EDA vendors
will have to deal more directly with PCI Express. The list below
contains links to vendor announcements this summer related to PCI
Express.

Synopsys' DesignWare IP Core for PCI Express First to Pass PCI-SIG
Compliance Tests

Cadence Incisive Palladium System Cuts NVIDIA's Verification Time in
Half; Palladium Accelerator/Emulator Speeds Verification of NVIDIA's
Newest Graphics Processor

Synopsys' New PCI Express PHY IP Enables Lower Cost ICs

Cadence and Rambus Sign Agreements to Deliver Portfolio of
High-Speed Serial Link Solutions

Xyratex Adopts Mentor Graphics PCI Express Intellectual Property for
Advanced Switching Industry Standard

Rambus and Mentor Graphics Collaborate to Offer Interoperable PCI
Express Solutions; Proven PCI Express-Compliant Solutions Now
Available to Chip Designers

Agere Systems Introduces Advanced PCI Express(R) and Gigabit
Ethernet Interface Solutions

Weekly Highlights

NEC Electronics America and Synplicity to Co-Host Seminar on
Structured ASICs and Amplify ISSP Software

Mentor Graphics FastScan ATPG Tool Selected for UMC's 130 and 90
Nanometer Reference Flow

International Engineering Consortium's Euro DesignCon 2004 to
Feature Vast Array of Technical Papers

Precision RTL Synthesis Tool From Mentor Graphics Delivers Excellent
QoR for Designs Using Actel's ProASIC Plus Devices

Apache's Physical Power Integrity Flow Adopted by ATI Technologies

Synopsys' DesignWare IP Core for PCI Express First to Pass PCI-SIG
Compliance Tests

Mentor Graphics Adds Serial ATA IP with Acquisition of Palmchip
Intellectual Property Business

Third Annual Asia Cadence Technology Symposium - ACTS Scheduled
August 24 Through September 3

Amkor Completes Acquisition of Unitive

Gartner Says Worldwide Semiconductor Revenue On Pace for 27 Percent
Growth in 2004

ARM and Artisan Combine to Deliver System-on-Chip IP Solutions